Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-temporal stores: use inline assembly #1541

Merged
merged 3 commits into from
Jun 21, 2024
Merged

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Feb 25, 2024

LLVM treats !nontemporal as just a hint on store operations, which is unsound -- they have a totally different semantics, similar to atomic memory orderings. So I'd like to avoid any risk of that causing any issues by entirely avoiding their !nontemporal attribute. Is it acceptable to use inline assembly to implement these intrinsics?

Note that this is my first time ever writing inline assembly, so the code may or may not make any sense.^^

@rustbot
Copy link
Collaborator

rustbot commented Feb 25, 2024

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@Amanieu
Copy link
Member

Amanieu commented Feb 25, 2024

My understanding is that LLVM can turn a nontemporal store into a normal one, but not the other way around. This seems to be fine as far as I understand.


The CI failure happens because the target_feature attribute only enables sse2 and rustc isn't smart enough to figure out that this implies sse (only LLVM knowns that). You fix it by enabling the sse feature as well.

@RalfJung
Copy link
Member Author

RalfJung commented Feb 25, 2024

My understanding is that LLVM can turn a nontemporal store into a normal one, but not the other way around. This seems to be fine as far as I understand.

It's completely unclear. LangRef talks about it as a hint:

The optional !nontemporal metadata must reference a single metadata name <nontemp_node> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this load is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.

That would mean the flag can be added or removed arbitrarily ("this load is not expected to be reused in the cache" -- but no semantic constraints or anything). But that's clearly wrong. LLVM doesn't acknowledge in the slightest the extra UB that can be caused by non-temporal stores (llvm/llvm-project#64521). Therefore I have zero confidence that anyone thought about how !nontemporal interacts with all the LLVM passes that work on load (almost all of which probably just ignore the attribute entirely). I'm not even aware of any cross-platform memory model with support for nontemporal stores that they could be using here -- and they clearly need a cross-platform memory model since they are doing optimizations in the context of a C++11-style model.

@RalfJung
Copy link
Member Author

SDE ERROR:  TID: 1064 executed instruction with an unaligned memory reference to address 0x7f27229035e0 INSTR: 0x562d8a5e21f3: IFORM: VMOVNTPS_MEMf32_ZMMf32_AVX512 :: vmovntps zmmword ptr [rax], zmm0
	IMAGE:    /checkout/target/x86_64-unknown-linux-gnu/release/deps/core_arch-59198cd2fc79a24a
	FUNCTION: _ZN9core_arch9core_arch3x867avx512f5tests20test_mm512_stream_ps20test_mm512_stream_ps17hb7f0b28acc824410E.llvm.13799798511543115899
	FUNCTION ADDR: 0x562d8a5e21c0

Hm, yes, this requires alignment, but that shouldn't be new...?

struct Memory {
pub data: [f32; 16],
pub data: [f32; 16], // 64 bytes
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should have failed many times already. The only explanation I have for why that did not happen is that maybe LLVM optimizes away the entire test...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the whole stack frame gets 64-byte aligned, since there are __m512 values involved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that happened it would also happen with this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack frame is probably different because you are missing the nostack option on the inline assembly. In fact these assembly blocks should include both nostack and preserves_flags.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure the test was just buggy before this PR. It doesn't actually ensure that the data is properly aligned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact these assembly blocks should include both nostack and preserves_flags.

I have added the options in the last commit.

@RalfJung
Copy link
Member Author

I think I found where LLVM defines the x86 intrinsics: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/IntrinsicsX86.td.

I found nothing with "stream" in the name, and the only "movnt" is int_x86_mmx_movnt_dq, probably accessible via llvm.x86.mmx.movnt.dq, which I assume is not the right thing.

There seem to be already quite a few asm! in stdarch so I guess using that here is acceptable? IMO it's better than just using normal loads since presumably people actually want the streaming semantics when using this operation.

@Amanieu Amanieu enabled auto-merge (rebase) June 21, 2024 14:38
@Amanieu Amanieu merged commit fd5fc64 into rust-lang:master Jun 21, 2024
29 checks passed
@RalfJung RalfJung deleted the movnt branch June 21, 2024 15:17
@RalfJung
Copy link
Member Author

Awesome, thanks. :)

After the next stdarch bump we can then remove the intrinsic from rustc.

@Amanieu
Copy link
Member

Amanieu commented Jun 21, 2024

The intrinsic is only broken on x86, it still has value on other targets.

@RalfJung
Copy link
Member Author

Hm, fair. Maybe we should then document the intrinsic as "it is semantically equivalent to a regular load, just a hint", and on x86 actually compile it to just a load since that architecture doesn't have a "just a hint" version of this. For all other architectures we'd have to check whether what LLVM does there is sensible or not.

@RalfJung
Copy link
Member Author

@Amanieu any chance we could get a stdarch bump in the rustc repo that includes this change? :)

@Amanieu
Copy link
Member

Amanieu commented Jul 13, 2024

We're waiting on a bootstrap bump that should happen next week.

@sayantn sayantn mentioned this pull request Aug 3, 2024
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 12, 2024
…ouxu,Amanieu,Jubilee

nontemporal_store: make sure that the intrinsic is truly just a hint

The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](llvm/llvm-project#64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.

~~Blocked on rust-lang/stdarch#1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~

Fixes rust-lang#114582
Cc `@Amanieu` `@workingjubilee`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Aug 12, 2024
Rollup merge of rust-lang#128149 - RalfJung:nontemporal_store, r=jieyouxu,Amanieu,Jubilee

nontemporal_store: make sure that the intrinsic is truly just a hint

The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](llvm/llvm-project#64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.

~~Blocked on rust-lang/stdarch#1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~

Fixes rust-lang#114582
Cc `@Amanieu` `@workingjubilee`
bjorn3 pushed a commit to rust-lang/rustc_codegen_cranelift that referenced this pull request Aug 15, 2024
…ieu,Jubilee

nontemporal_store: make sure that the intrinsic is truly just a hint

The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](llvm/llvm-project#64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.

~~Blocked on rust-lang/stdarch#1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~

Fixes rust-lang/rust#114582
Cc `@Amanieu` `@workingjubilee`
antoyo pushed a commit to rust-lang/rustc_codegen_gcc that referenced this pull request Oct 9, 2024
…ieu,Jubilee

nontemporal_store: make sure that the intrinsic is truly just a hint

The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](llvm/llvm-project#64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.

~~Blocked on rust-lang/stdarch#1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~

Fixes rust-lang/rust#114582
Cc `@Amanieu` `@workingjubilee`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants